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Abstract 

We consider conditional exact tests of factor effects in designed experiments 
for discrete response variables. Similarly to the analysis of contingency tables, a 
Markov chain Monte Carlo method can be used for performing exact tests, when 
large-sample approximations are poor and the enumeration of the conditional sample 
space is infeasible. For designed experiments with a single observation for each run, 
we formulate log-linear or logistic models and consider a connected Markov chain 
over an appropriate sample space. In particular, we investigate fractional factorial 
designs with 2^"'^ runs, noting correspondences to the models for 2^""^ contingency 
tables. 

1 Introduction 

Exact calculations of p values for statistical conditional tests arise mainly in the context 
of analyzing contingency tables. For example, Fisher's exact test is frequently used for 
evaluating the hypothesis that the row effect and column effect are independent in the 2x2 



contingency tables. Fisher's exact test is generalized to / x J contingency tables in [11 
Traditionally, statistical tests for contingency tables have relied heavily on large-sample 
approximations for sampling distribution of the test statistics. However, many works have 
shown that large-sample approximations can be very poor when the contingency table 
contains both small and large expected frequencies even when the sample size is large. 
See p^], for example. Moreover, coupled with rapid development both in computer power 
and in techniques of algorithms, exact calculations of p values become feasible in various 
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settings for practical use. Consequently, for many types of problems where some ingenious 
calculation schemes are invented, it is unnecessary to use large-sample approximations for 
sampling distributions nowadays when their adequacy is in doubt. A typical example is 
the network algorithm by [l8| for calculating exact p values of Freeman-Halton tests in 
two-way contingency tables. See the survey paper by 

At the same time simulation techniques for estimating p values by Monte Carlo proce- 
dures have also developed. In particular, for the problems where a closed form expression 
of the sampling distribution can not be obtained, Monte Carlo method provide powerful 
tools. Note that, in contrast to the large-sample approximations, we can estimate p val- 
ues in arbitrary accuracy, theoretically, by increasing simulation sizes. However, for many 
models, such as general hierarchical log-linear models in multi-way contingency tables, 
direct generation of random sample is not straightforward. In this case, Markov chain 
Monte Carlo techniques can be used. 

For performing Markov chain Monte Carlo methods for sampling from discrete sample 
space, an important problem is how to construct a connected Markov chain on the given 
sample space. Note that, if an arbitrary connected Markov chain is constructed, the 
chain can be modified to give a connected and aperiodic Markov chain with stationary 
distribution being the desired null distribution by the usual Metropolis procedure ([l^l, for 
example). As for this point, the first breakthrough work is given by j^. The key notion of 

is a Markov 6aszs, which enables to construct a connected Markov chain for arbitrary 
observed data set. [8|] presented a general algorithm for computing a Markov basis in 
the settings of a general discrete exponential family of distribution. Their approach 
relies on the existence of a Grobner basis of a well specified polynomial ideal. After 
[sl, the techniques of the Markov chain Monte Carlo method for sampling from discrete 
conditional distributions are rapidly developed in the past decade. See for example j^, 
U and the works by Aoki and Takemura (|3,0,0,0],(23|'H)- 



In this paper, we consider conditional exact tests in the context of designed experiments 
with counts (or ratios of counts) observations. In most of the classical literatures on 
designed experiments, the responses are assumed to be normally distributed. However, in 
many practical situations, the experimental data are not normally distributed. For such 



non-normal data, the generalized linear models are frequently used. See [13|| or Chapter 13 



of [25|, for example. In these literatures, however, exact testing procedures for non- normal 
data are not considered. Since the experimental design is used when the cost of obtaining 
the data is relatively high, it is very important to develop techniques of exact procedures 
for the case of non- normal responses. Therefore in this manuscript, we consider the exact 
testing procedures for non-normal responses, based on the theory of the generalized linear 
models. For discrete responses, the above background and strategies also apply, i.e., to 
calculate p values for conditional tests, 

1. traditionally, large-sample approximations such as the normal distribution or chi- 
square distribution are used, 

2. if the observed data set contains both small and large expected values, the adequacy 
of the approximation becomes poor. 
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3. if the sample space is of moderate size, or some ingenious algorithms can be used, 
an exact calculation of p values is possible, 

4. if an exact calculation is not feasible, we can rely on Monte Carlo procedure, 

5. if a closed form expression of the null distribution is not given, Markov chain Monte 
Carlo procedure can be employed, if a Markov basis is available. 

The topic we consider in this paper is No. 5 of the above list. In Section 2, we formulate 
conditional exact tests of factor effects for fractional factorial designs. For designed exper- 
iments with a single observation for each run, we formulate log-linear or logistic models 
and consider how to construct a null model to be tested using the theory of generalized 
linear models. In Section 3, we consider Markov chain Monte Carlo tests for designed 
experiments. First we give a definition of Markov bases and a simple algorithm for eval- 
uating p values by the Markov chain Monte Carlo tests in Section 3.1. In Section 3.2, 
we consider correspondences between the fractional factorial designs with 2^^'^ runs and 
models for 2^"'' contingency tables. We end the paper with some discussions in Section 4. 



2 Conditional tests for fractional factorial designs 

In this section, we consider the exact conditional tests for the discrete observations derived 
from fractional factorial designs. We investigate the designs with a single observation for 
each run, which is either a count or a ratio of counts. For the former case we consider 
the log-linear models, and for the latter case we consider the logistic models. Since our 
arguments for the two cases are almost the same, we first explain in detail the log-linear 
case in Section 2.1, and then give only a short description and a remark for the logistic 
case in Section 2.2. 



2.1 Exact conditional tests for log-linear models of Poisson ob- 
servations 

First we investigate the case that the observations are counts of some events. In this case, 
it is natural to consider Poisson models. To clarify the procedures of exact tests, we take 
a close look at an example of fractional factorial design with counts observations. Table 
1 is a 1/8 fraction of a full factorial design (i.e., a 2''"'^ fractional factorial design) defined 
from the aliasing relation 

ABDE = ACDF = BCDG = I, (1) 



and response data analyzed in [Tj and reanalyzed in 13|. In Table 1, the observation 
y is the number of defects arising in a wave-soldering process in attaching components 
to an electronic circuit card. In Chapter 7 of 0], he considered seven factors of a wave- 
soldering process: (A) prebake condition, (B) flux density, (C) conveyer speed, (D) preheat 
condition, (E) cooling time, (F) ultrasonic solder agitator and (G) solder temperature. 
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Table 1: Design 



and number of defects y 



for 



the wave-solder experiment 



Factor y 



Run 


A 


B 


C 


D 


E 


F 


G 


1 


2 


3 


1 


1 


1 


1 


1 


1 


1 


1 


13 


30 


26 


2 


1 


1 


1 


2 


2 


2 


2 


4 


16 


11 


3 


1 


1 


2 


1 


1 


2 


2 


20 


15 


20 


4 


1 


1 


2 


2 


2 


1 


1 


42 


43 


64 


5 


1 


2 


1 


1 


2 


1 


2 


14 


15 


17 


6 


1 


2 


1 


2 


1 


2 


1 


10 


17 


16 


7 


1 


2 


2 


1 


2 


2 


1 


36 


29 


53 


8 


1 


2 


2 


2 


1 


1 


2 


5 


9 


16 


9 


2 


1 


1 


1 


2 


2 


1 


29 





14 


10 


2 


1 


1 


2 


1 


1 


2 


10 


26 


9 


11 


2 


1 


2 


1 


2 


1 


2 


28 


173 


19 


12 


2 


1 


2 


2 


1 


2 


1 


100 


129 


151 


13 


2 


2 


1 


1 


1 


2 


2 


11 


15 


11 


14 


2 


2 


1 


2 


2 


1 


1 


17 


2 


17 


15 


2 


2 


2 


1 


1 


1 


1 


53 


70 


89 


16 


2 


2 


2 


2 


2 


2 


2 


23 


22 
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each at two levels with three boards from each run being assessed for defects. The aim 
of this experiment is to decide which levels for each factors are desirable to reduce solder 
defects. 

In this paper, we only consider designs with a single observation for each run. There- 
fore, in this example, we focus on the totals for each run in Table 1. This is natural for 
the settings of Poisson models, since the set of the totals for each run is the sufficient 
statistics for the parameters. We also ignore the second observation in run 11, which 



is an obvious outlier as pointed out in [13|. Therefore the weighted total of run 11 is 



(28 -|- 19) X 3/2 = 70.5 ~ 71. By replacing 2 by —1 in Table 1, we rewrite k x p design 
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matrix as D, where each element is +1 or —1. Consequently, we have 



D 



1 +^ 


+1 


+ 1 


+ 1 


+ 1 


+ 1 


+1 


+1 


+1 


-1 


-1 


-1 


+1 


+1 


-1 


+ 1 


+ 1 


-1 


+1 


+1 


-1 


-1 


-1 


+ 1 


+1 


-1 


+ 1 


+ 1 


-1 


+ 1 


+1 


-1 


+ 1 


-1 


+ 1 


-1 


+1 


-1 


-1 


+1 


-1 


-1 


+1 


-1 


-1 


-1 


+ 1 


+ 1 


-1 


+1 


+ 1 


+1 


-1 


-1 


_1 

— 1 






_1 
— 1 






-1 


+1 


-1 


+1 


-1 


+ 1 


-1 


+1 


-1 


-1 


+ 1 


-1 


-1 


-1 


+ 1 


+ 1 


+ 1 


-1 


-1 


-1 


+ 1 


-1 


-1 


+ 1 


-1 


-1 


-1 


+1 


+ 1 


+ 1 


V -1 


-1 


-1 


-1 


-1 


-1 




and 


D = 


[dij 


) = 


(di 



-1 \ 



-1 
-1 
+1 
-1 
+1 
+1 
-1 
+1 
-1 





/ 


69 


\ 






31 








55 








149 








46 








43 








118 








30 




! y = 




43 








45 








71 








380 








37 








36 








212 






V 


52 


/ 



Write y = 

{— 1,+1}^ is the j-th column vector of D 
aliasing relations defining this design, k = 2^~' holds (p 
define dgt and dgtu, l<s<t<u<p, as 



. dr,) where d,- 



id 



. dkj)' 



k is the number of runs. If there are q 
7,q = 3 for this example). We 



(disdit, • • • , dksdkt)' 



and 



{disdifdiu, . . . , dkgdktdku) 

for later use. 

The statistical model for this type of data is constructed from the theory of gener- 
alized linear models ([l3l)- Assume that the observations yi are mutually independently 
distributed with fii = E{yi), i = 1, . . . , k. The mean parameter fii is expressed as 

gil^'i) = Po + PiXii H h P^Xi^, 

where g{-) is the link function and xn, . . . ,Xiy are the v covariates defined below. The 
sufficient statistic is written as X]i=i ^ijUi- The canonical link for the Poisson distribution 
is g{ixi) = log/ii. 

Now we define covariates. We write the //-dimensional parameter (3 and the covariate 
matrix X as 

/3=(/5o,/3i,...,/5.-iy 

and 

/ 1 Xii ■ ■ ■ Xiu-l \ 



X 



( Ifc Xi 



^u-l ) 



\ 1 Xkl ■ ■ ■ Xkv-1 J 
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where 1^ = (1, . . . , 1)' is the /c- dimensional column vector. Since the likelihood function 
is written as 

k Hi / k / k 



-I-H 



k 



n 



e 



exp iP'X'y) 



the sufficient statistic for /? is X'y = (I'^^y, x'^^y, . . . , x'^^^^^y). 

The matrix X is constructed from the design matrix D to reflect the effects of the 
factors and their interactions which we intend to measure. For example, a simple model 
which only includes the main effects for each factor is given as X = (1^ D), i.e., Xj = dj 
for j = 1, . . . , z/ — 1 = p. On the other hand, we can consider a more complicated model 
containing various interaction effects, under the condition that it is consistent with the 
aliasing relations. In this example, the aliasing relation up to four-factor interactions is 
derived from as follows. 

I = ABDE = ABFG = ACDF = ACEG = BCDG = BCEF = DEFG 

: CEF 

EFG 



A = 


BDE 


= BEG 


= CDE 


= CEG, 


B = ADE = AEG 


= CDC 


C = 


ADF 


= AEG 


= BDG 


= BEE, 


D = ABE = ACE 


= BCG 


E = 


ABD 


= ACG 


= DEG, 


E = ABC = ACD = BCE 


= DEC 


G = 


: ABE 


= ACE 


= BCD 


= DEE 






AB 


= DE 


= EG = 


ABDG 


= ACDG = 


: ACEE = BCDE = 


BCEG 


AC 


= DE 


= EG = 


ABEE 


= BCDE = 


BCEG 




AD 


= BE 


= CE = 


ABCG 


= AEEG = 


BDEG = CDEG 




BC 


= DC 


= EE = 


ABDE 


= ABEG = 


ACDE = ACEG 




BD 


= AE 


= CG = 


ABCE 


= ADEG = 


BEEG = CDEE 




CD 


= AE 


= BG = 


ABCE 


= ADEG = 


: BDEE = CEEG 




AG 


= BE 


= CE = 


ABCD 


= ADEE = 


BDEG = CDEG 




ABC = ADG = AEF = BDF = BEG 


= CDE = CEG 





Subject to the above aliasing relations, we may consider appropriate models where all the 
parameters are estimable. For example, the saturated model for this example includes 
16(= k) parameters, when X is the Hadamard matrix of the order 16. One of the inter- 
pretations of the saturated model includes seven main effects, seven two-factor interaction 
effects, AB,AC,AD,AG,BC,BD,CD, and one three-factor interaction effect, ABC. We 
write this model as 

ABC/AD/BD/CD/AG/E/F 

by the manner of the hierarchical models. In this case, as in Table Ej the columns of X 
can be indexed as 

X = {Ik, di, d2, di2, da, dis, d23, di23, d^, du, d24, ds, d34, de, dy, diy). 
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Table 2: Hadamard matrix of the order 16 and two interpretations. 



+1 


+1 


+1 


+1 


+1 


+1 


+1 


+1 


+1 


+1 


+1 


+1 


+ 1 


+ 1 


+ 1 


+1 


+1 


+1 


+1 


+1 


+1 


+1 


+1 


+1 


-1 


-1 


-1 


-1 


-1 


-1 


-1 


-1 


+1 


+1 


+1 


+1 


-1 


-1 


-1 


-1 


+1 


+1 


+1 


+ 1 


-1 


-1 


-1 


-1 


+1 


+1 


+1 


+1 


-1 


-1 


-1 


-1 


-1 


-1 


-1 


-1 


+1 


+1 


+1 


+1 


+1 


+1 


-1 


-1 


+1 


+1 


-1 


-1 


+1 


+1 


-1 


-1 


+1 


+1 


-1 


-1 


+1 


+1 


-1 


-1 


+1 


+1 


-1 


-1 


-1 


-1 


+1 


+1 


-1 


-1 


+ 1 


+1 


+1 


+1 


-1 


-1 


-1 


-1 


+1 


+1 


+1 


+1 


-1 


-1 


-1 


-1 


+1 


+ 1 


+1 


+1 


-1 


-1 


-1 


-1 


+1 


+1 


-1 


-1 


+1 


+ 1 


+1 


+1 


-1 


-1 


+1 


-1 


+ 1 


-1 


+1 


-1 


+ 1 


-1 


+ 1 


-1 


+ 1 


-1 


+ 1 


-1 


+ 1 


-1 


+1 


-1 


+1 


-1 


+1 


-1 


+1 


-1 


-1 


+1 


-1 


+ 1 


-1 


+1 


-1 


+1 


+1 


-1 


+1 


-1 


-1 


+1 


-1 


+1 


+1 


-1 


+1 


-1 


-1 


+1 


-1 


+1 


+1 


-1 


+1 


-1 


-1 


+1 


-1 


+1 


-1 


+1 


-1 


+1 


+ 1 


-1 


+1 


-1 


+1 


-1 


-1 


+1 


+1 


-1 


-1 


+1 


+1 


-1 


-1 


+1 


+ 1 


-1 


-1 


+1 


+1 


-1 


-1 


+ 1 


+1 


-1 


-1 


+1 


-1 


+1 


+1 


-1 


-1 


+1 


+1 


-1 


+1 


-1 


-1 


+ 1 


-1 


+1 


+1 


-1 


+1 


-1 


-1 


+1 


-1 


+1 


+ 1 


-1 


+1 


-1 


-1 


+ 1 


-1 


+1 


+ 1 


-1 


-1 


+1 


+1 


-1 


+ 1 


-1 


-1 


+1 


I 


A 


B 


AB 


C 


AC 


BC 


ABC 


D 


AD 


BD 


E 


CD 


F 


G 


AG 


I 


A 


B 


AB 


C 


AC 


BC 


ABC 


D 


AD 


BD 


ABD 


CD 


ACD 


BCD 


ABCD 



Note that the interpretation of the models is not unique. An extreme interpretation of 
the saturated model is given by ignoring three main effects, E, F, G, and considering all 
the higher interaction effects among A, B, C, D, which is written as 

ABCD. 

This interpretation is not realistic in the context of designed experiments, but it is useful 
for understanding Markov bases in terms of contingency tables. These two interpretations 
of the saturated model are shown in Table [H Note that the concept of identifiable model 
in the saturated model is also given in fl9]. Using the theory of Grobner basis, |19|] gives 
a method to obtain one of the maximal identifiable models for a given design. See also 
3 and 0- 



Since the saturated model cannot be tested, we consider an appropriate submodel of 
the saturated model. For the purpose of illustration we focus on the model considered in 



13(1, which is given as 

AC/BD/E/F/G, (2) 

i.e., the model of seven main effects and two two-factor interactions. We treat this model 
as the null model and consider significance tests. In this case, the null hypothesis can be 
described as follows. Permuting the columns of Table [H we partition the covariate matrix 
X of the saturated model as 

X = (Xo Xi), 

-^0 = (Ifc, di, da, dg, d4, dg, de, dy, di3, d24) = (Ik, xi, . . . , x,,_i), 

Xi = (di2, d23, di23, di4, d34, di7) = (x,,, . . . , Xfc_i), 

and consider the corresponding parameter P = (/5o,/3i, . . . ,Pk-i)- Then the submodel is 
specified in the form of a null hypothesis 

Ho : = ■ ■ ■ = Pk-i = 0. 
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Under Hq, the nuisance parameters are f3o, . . . , and the sufficient statistic for the 
nuisance parameters is X^y. Then the conditional distribution of y given the sufficient 
statistics is written as 

k 

/(y I X',y = X'.y") = C(X^y°) J] 

i=i 

where C{XQy°) is the normahzing constant determined from Xgy" and written as 

ye^(X^y°) \i=l ^'7 

and 

jF(XQy°) = {y I Xgy = X'^y", i/i is a nonnegative integer for z = 1, . . . , A;}. (5) 

Note that by sufficiency the conditional distribution does not depend on the values of the 
nuisance parameters. 

We can now consider significance tests against various alternatives to Hq. An im- 
portant alternative is to test the effect of a single additional effect. For example in the 
above example we can test presence of AB-interaction effect by considering the alterna- 
tive hypothesis Hi : 0. Or if we are interested in the goodness-of-ffi test, then the 
alternative hypothesis is Hi : {P^, . . . , Pk-i) 7^ (0, . . . ,0). Depending on the alternative 
hypothesis, we can use appropriate test statistic T{y), such as the likelihood ratio statistic 
for testing Hq against Hi. In Section 3 we give a procedure to sample from the conditional 
distribution ([3]). Therefore we can assess the conditional distribution of any test statistic 
T(y) under Hq. 

For the purpose of illustration we now consider goodness-of-fit procedures. Tradi- 
tional tests evaluate the upper probability for some discrepancy measures such as 
the deviance, the likelihood ratio or Pearson goodness-of-ffi, based on the asymptotic 
distribution, xl~u- example, the likelihood ratio statistic 

k 

ny) = G'^(y) = 2$^i/.iog? (6) 

is frequently used, where /2j is the maximum likelihood estimate for /ij under the null 
model (i.e., ffited value), given by 

fx= (64.53, 47.25,..., 51.42)' 

for our example. Then for the observed data y° = (?/°, • • • , 1/^)', ^(y'') = G^(y°) is 
calculated as T(y°) = 19.096 and the corresponding asymptotic p value is 0.0040 from 
the asymptotic distribution xi- This result tells us that the null hypothesis is highly 
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Table 3: Design and number of good parts y out of 1000 for the windshield molding 
slugging experiment 

Factor 



Run 


A 


B 


C 


D 


2/ 


1 


1 


1 


1 


1 


338 


2 


1 


1 


2 


2 


826 


3 


1 


2 


1 


1 


350 


4 


1 


2 


2 


2 


647 


5 


2 


1 


1 


2 


917 


6 


2 


1 


2 


1 


977 


7 


2 


2 


1 


2 


953 


8 


2 


2 


2 


1 


972 



significant and is rejected. Using the conditional distribution ([3]), the exact p value is 
written as 

p = 5^ /(y I X^y = X^y°)l(T(y) > T(y°)), 



where 



l(T(y) > T(y°)) = T' 

^ i\y )) I otherwise. 



Unfortunately, however, an enumeration of all the elements in jF(XQy°) and hence the 
calculation of the normalizing constant C{X'Qy°) is usually computationally infeasible for 
large sample space. Instead, we consider a Markov chain Monte Carlo method described 
in Section 3. 

2.2 Exact conditional tests for logistic models of binomial ob- 
servations. 

Next we consider the case that the observation for each run is a ratio of counts. Table 3 
is a 1/2 fraction of a full factorial design (i.e., a 2^~^ fractional factorial design) defined 
from the relation 

ACD = I (7) 



and response data given by 16|] and reanalyzed in 13|. In Table 3, the observation y 



is the number of good parts out of 1000 during the stamping process in manufacturing 



windshield modeling. The purpose of [16|] is to decide the levels for four factors, (A) poly- 
film thickness, (B) oil mixture, (C) gloves and (D) metal blanks, which most improve the 
slugging condition. Similarly to Section 2.1, we rewrite this data as 
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D 





1 1 


1 1 


-hi 1 




< OOO 1 


+1 

1 -L 


+ 1 

1 J- 


-1 


-1 




826 


+1 


-1 


+ 1 


+ 1 




350 


+1 


-1 


-1 


-1 




647 


-1 


+ 1 


+ 1 


-1 


, y = 


917 


-1 


+ 1 


-1 


+ 1 




977 


-1 


-1 


+ 1 


-1 




953 


^ -1 


-1 


-1 


+ W 




^ 972 / 



As for a statistical model for this type of data, it is natural to suppose that the distri- 
bution of the observation Ui is the mutually independent binomial distribution Bin(/ij, rij), 
i = 1, . . . ,k, where Ui = 1000, i = 1, . . . , k = 8 for this example. Following the theory of 
generalized linear models, we also consider the logit link, which is the canonical link for 
the binomial distribution. It expresses the relation between the mean parameter Hi and 
the systematic part as 



gifii) = log 



1 - /ii 



Po + Pi^ii H h P^Xi 



The covariate matrix and the corresponding parameters are defined similarly as Section 
2.1, i.e., permuting the columns of the Hadamard matrix of the order 8, we write X = 
{Xq Xi) in such a way that Xq is the covariate matrix for the appropriate null model. In 
this example, we consider the simple main-effects model, A/B/C/D, which is considered 
Tij. For this model, the covariate matrix is written as Xq = (1^ D). Similarly to 



m 



Section 2.1, we consider the conditional tests for various alternatives to this model. In 
this case, since the likelihood function is written as 



n 

i=l 



rii 
Vi 



n 

i=l 
k 

n 

i=l 



rii 
Vi 

Vi 



.1 - ^^i 



the nuisance parameters under the null hypothesis are X'^y, ni, . . . , n^. Consequently, the 
exact conditional tests are based on the conditional distribution, 

k 

fiy I X^y = X^y°, rii, . . . , n,) = C(X^y°, n,, . . . , n,) fj ^_ (8) 

where C{Xl^y", ni, . . . , Uk) is the normalizing constant determined from X'^y", rii, . . . ,nk 
and written as 



ye.F(X^y°,ni,...,nfc) \i=l ^''^ ' 



(9) 
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and 



^(X;y°, . . . , nu) = {y | X'^y = X^y°, e {0, 1, . . . , n,,}, z = 1, . . . , fc}. (10) 
For notational convenience, we extend y to 

y = (2/1, ...,yk,ni-yi,...,nk- yk)' 
for the binomial model. Corresponding to this y, we also extend u x k matrix Xq to 

Xq Ou,k 



where O^^k is the u x k zero matrix and Ik is the identity matrix of the order k. In the 
theory of the toric ideals, Xq is called the Lawrence lifting of the configuration Xq. See 



15( 1 for details. Using y and Xq , the condition that Xgy and ni, . . . , are fixed is simply 



written that Xq y is fixed. Hereafter for notational simplicity, we write y and Xq instead 

of y and Xq . Namely, to express the conditional distribution and its support for the 
binomial model, we use the expression ([3]) (jl]) ([S]) , those for the Poisson model, instead of 
(©©([lOD, respectively 



3 Markov chain Monte Carlo tests for the designed 
experiments 

In this section, we consider the Markov chain Monte Carlo methods for calculating p values 
defined in Section 2. In Section 3.1, we give an explanation of Markov chain Monte Carlo 
methods, along with the definition of Markov bases. We also describe some algorithms 
and softwares, which are useful in applications. In Section 3.2, we investigate the relation 
between the fractional factorial designs and contingency tables. 



3.1 Markov chain Monte Carlo methods for evaluating p values 

To perform the exact tests defined in Section 2, a useful approach is to generate samples 
from the conditional distribution /(y | XqJ = X^y") and calculate p values for any test 
statistic. In particular, when the closed form expression of the null distribution can not 
be obtained, a Markov chain Monte Carlo approach is a valuable tool. If a connected 
Markov chain over jF(XQy°) is constructed, the chain can be modified to give a connected 
and aperiodic Markov chain with stationary distribution /(y | X^y = X^y") by the usual 
Metropolis procedure. 

To construct a connected chain, a frequently used approach is to prepare a Markov 
basis defined in Let 7V1(Xq) be the set of integer vectors which are in the kernel of 
X^, i.e., 

A^(Xq) = {z = (zi, . . . , Zk)' I XqZ = 0, Zi is an integer for z = 1, . . . , k}, 
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where is the zero vector. We call the element in A^(Xq) a move for Xq, in the sense 
that adding z G M.{Xq) to any y does not change the sufficient statistics, i.e., 

X^(y + z) = X;y. 

An important point is that y + z can include a negative element. On the other hand, 
if y + z G J-'IXqy), i.e., y + z is still a non-negative vector, we see that y is moved to 
y + z G jF(Xoy) by z. Now we give the definition of a Markov basis. 

Definition 3.1. A Markov basis for Xq is a finite set of moves B = {zi, . . . , z^}, zj G 
7V1(Xq), j = 1, . . . , L, such that, for any y, y* G J-'^Xqy"), there exists A > 0, (ei, z^J, . . . , 
{ea, Zja) '"'^^^ ^ {^1' ^ s = 1, . . . , A, satisfying 



A 

E 



EsZj^ and y* 



a 

E 

s=l 



Esz,^ e J^iX'^y") fora = l,...,A. 



By definition, a Markov basis enables to construct a connected chain over J-'lX'^y"), 
which is modified so as to have the null distribution /(y | Xgy = Xqj") as the stationary 
distribution by the Metropolis-Hastings procedure. Therefore we can perform various 
conditional tests by the Monte Carlo sampling. We give a simple algorithm to calculate 
p values for some test statistic T(-) based on the Markov chain Monte Carlo sampling. 

Input: Markov basis B, observed data y°, covariate matrix Xq, 

size of sample N, null distribution /(■) , test statistic T(-) 
Output : p value 

Variables: obs, count, sig, y,y next 



Step 1 
Step 2 
Step 3 



Step 4 
Step 5 
Step 6 

Step 7 



obs = T{y°) . y = y" ■ count 
Choose z from B randomly. 
Select ynext from {y + nz | 



= 0. sig = 0. 

I = {n \ y + nzeJ^iX'^y")}. 
n E 1} with probability 



Pn 



f{y + nz) 
^/(y + nz) 



nel 



If T{ynext) > obs then sig = sig+ 1. 

y = ynext ■ count = COUUt + 1 . 

If count < N then Go to Step 2. 

sig 



Estimated p value is 



N 



Note that we need not calculate the normalizing constant, C(XQy°) in (jlj) or C(XQy°, 
ni, . . . , rifc) in ([9]), of the null distribution /(■), since it is canceled in the numerator and 
denominator in Step 3. 

Derivation of Markov bases is itself a very interesting problem. Markov bases can 
be very complicated and hard to compute for large models. Many works, including the 
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original work by |8|]jhave relied on the theory of computational algebra and Grobner 
bases. See [sf, [loj. On the other hand, a series of works by Aoki and Takemura 
investigates the structure of minimal Markov bases and gives some characterizations. In 
particular, Aoki and Takemura([3], 0], 0) give the expression of the minimal Markov 
bases directly (i.e., not by using algebraic algorithm) for some problems of contingency 
tables. 

In applications, it is most convenient for readers to rely on algebraic computational 
packages such as 4ti2 ([li]). For example, consider the problem we have seen in Section 
2.1. For the model ([2]), the covariate matrix Xq is a 10 x 16 matrix. To calculate the 
Markov basis for this Xq by 4ti2, we only have to prepare a datafile 



10 16 
1 
1 
1 
1 

1 - 
1 - 
1 - 
1 - 
1 

1 - 



11111111111111 

1 1 1 1 1 1-1-1-1-1-1-1 -1 -1 

1 1-1-1-1-1 1 1 1 1-1-1 -1 -1 

-1-1 1 1-1-1 1 1-1-1 1 1 -1 -1 

1-1 1-1 1-1 1-1 1-1 1-1 1-1 

1-1-1 1-1 1-1 1-1 1 1-1 1-1 

-1 1 1-1-1 1 1-1-1 1 1-1-1 1 

-1 1-1 1 1-1 1-1-1 1-1 1 1-1 

-1-1 1 1 -1 -1 -1-1 1 1-1-1 1 1 

1-1-1 1-1 1 1-1 1-1-1 1-1 1 



and run the command markov. Then the list of a minimal Markov basis is instantly given 

as 

35 16 

-1 -1 001100001100 -1 -1 
-1 -1 0111 -1 00010000 -1 
-1 -1 101 10 -1 000100 -1 

The above output shows that a minimal Markov bases for this Xq consists of 35 moves, 
which corresponds to each row. 

Using a minimal Markov basis, we perform the likelihood ratio test based on in 
Section 2.1. After 100, 000 burn-in steps, we construct 1, 000, 000 Monte Carlo samples. In 
contrast to the asymptotic p value 0.0040, the estimated p value is 0.032, with estimated 
standard deviation 0.0045, where we use a batching method to obtain an estimate of 
variance, see ij] and 2l|. Figure 1 shows a histogram of the Monte Carlo sampling 
generated from the exact conditional distribution of the likelihood ratio statistic under 
the null hypothesis, along with the corresponding asymptotic distribution xl- Figure 1 
shows that the asymptotic distribution understates the probability that the values of the 
test statistic for the samples is not less than the observed value, and overemphasizes the 
significance. 
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Figure 1: Asymptotic and Monte Carlo estimated exact distribution 



Table 4: Eight- 


run 2^-9 


fractional factorial desig 


ns 


(p-q = 3) 


Number of factors p 


Resolut 


ion Design Generators 






4 


IV 


D = ABC 






5 


III 


D = AB, E = AC 






6 


III 


D = AB, E = AC, 


F 


= BC 


7 


III 


D = AB, E = AC, 


F 


= BC, G = ABC 



3.2 Markov bases and corresponding models for 2^ contin- 
gency tables 

In this section, we investigate relationships between contingency tables and fractional 
factorial designs with 2^""^ runs. As noted in Section 1, Markov bases have been mainly 
investigated in the context of contingency tables. For example, gives an expression 
of minimal Markov bases of all the hierarchical models for 2*^ contingency tables. This 
list may be sufficient in application for the analysis of 2^ contingency tables, since the 
hierarchical model is the most natural class of models to be considered. We show in this 
section that, considering the fractional factorial designs, we encounter some new models 
and Markov bases, which do not correspond to hierarchical models of contingency tables. 



Fractional factorial designs with 8 runs. First, we consider fractional factorial de- 
signs with 8 runs, i.e., the case of p — q = 3. The most frequently used designs are listed 
in Table 4. We clarify the relationships between these designs and the models of 2^ con- 
tingency tables y = (yijk), ^ h j, k < 2, for the Poisson observations, and the models 
of 2"^ contingency tables y = {Uijki), 1 < hj, k,i < 2, for the binomial observations. 



14 



In the case of Poisson observations, we write eight observations as if they are the 
frequencies of 2^ contingency table, i.e.. 



y = (2/111, yil2, 2/121, 2/122, 2/211, 2/212, 2/221 , 2/222)'- 

In the case of p = 5, for example, the design and the observations are given as follows. 







Factor 






Run 


A 


B 


C 


D 


E 


2/ 


1 


1 


1 


1 


1 


1 


2/111 


2 


1 


1 


2 


1 


2 


2/112 


3 


1 


2 


1 


2 


2 


2/121 


4 


1 


2 


2 


2 


1 


2/122 


5 


2 


1 


1 


2 


1 


2/211 


6 


2 


1 


2 


2 


2 


2/212 


7 


2 


2 


1 


1 


2 


2/221 


8 


2 


2 


2 


1 


1 


2/222 



For this type of data, we define i/-dimensional parameter (3 and the covariate matrix X 
according to an appropriate model we consider, as explained in Section 2. First consider 
the simple main effect model A/B/C/D/E (z/ = 5). To test this model against various 
alternatives, Markov chain Monte Carlo testing procedure needs a Markov basis for the 
covariate matrix 



( +1 +1 +1 +1 +1 

+1 +1 +1 +1 -1 

+1 +1 -1 -1 +1 

+1 -1 +1 -1 +1 

+1 +1 -1 -1 -1 

1 +1 -1 



-1 +1 +1 \ 

-1 -1 -1 

-1 -1 -1 

\ +1 -1 

-1 +1 +1 



1 +1 -1 +1 / 



Note that each row of the corresponds to the sufficient statistic under the null model 
A/B/C/D/E. In this case, the sufficient statistic is given as 



2/..., 2/1-, 2/2-, 2/-1-, 2/-2-, 2/-1, 2/-2, 

2/11- +2/22-, 2/12- +2/21-, 2/11+2/2-2, 2/1-2 +2/2-1, 

where we use the notations such as 

2 2 2 2 2 2 



(12) 



i=i j=i k=i 



j=i k=i 



k=l 



Here we see that the sufficient statistic (fT2|) is nothing but a well-known sufficient statistic 
for the conditional independence model AB/AC, given as 



{Vij }, {Vi k}, k = l,2. 



(13) 
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The one-to-one relation between f|T2|) and f|T3|l is easily shown as 



2 



Vi-k = 



Hi.. + y..k - {Vi-k* + yi*-k) 
2 



(14) 



where {i, i*}, {j,j*} and {k, k*} are distinct indices, respectively. This correspondence is, 
of course, due to the aliasing relation D = AB, E = AC. 

Next consider another model. Since there are eight observations, we can estimate 
eight parameters at most (in the saturated model). Since the saturated model cannot be 
tested, let us consider the models of seven parameters, i.e., case of z/ = 6. If we restrict our 
attention to the hierarchical models, five main effects and one of the two-factor interaction 
effects, BC, BE, CD, DE, can be included in the models, since the aliasing relation is given 
as 



If our null model includes BC or DE, i.e., if our null model is written as A/BC/D/E or 
A/B/C/DE, we add the column 



to the covariate matrix Xq. In this case, the sufficient statistic under the null model 
includes y.u + y.22 and y.12 + y.2i in addition to f|T2l) . which is nothing but a well-known 
sufficient statistic for the no three-factor interaction model, AB/AC/BC, 



by the similar relations to (fT4|) . 

On the other hand, if our null model includes BE or CD, i.e., if our null model is 
written as A/BE/C/D or A/B/CD/E, we have to add the column 



to the covariate matrix Xq. In this case, the sufficient statistic under the null model 
includes ym + 7/122 + I/212 + ^221 and yiu + yui + I/211 + Z/222 in addition to ([T2]). This 
is one of the models which do not have corresponding models in the hierarchical models 
of three-way contingency tables. We write this new model as AB/AC -|- (ABC). The 
sufficient statistic for this model is 

{Vij }, {Vik}, yiii + 2/122 + 1/212 + 2/221, 2/112 + 2/121 + 2/211 + 2/222, i,j,k = 1,2. 

Similarly, we can specify the corresponding models of three-way contingency tables 
(for the factors A, B, C) to all the possible models for the designs of Table 4, as if the 
observations are the frequencies of 2^ contingency tables. The result is summarized in 



A = BD = CE, B = AD, C = AE, D = AB, E = AC, 
BC = DE, BE = CD = ABC. 



(+1 -1 -1 +1 +1 -1 -1 +1) 



{vij }, {vik}, {y-jk}, i,j,k = 1,2, 



(+1 -1 -1 +1 -1 +1 +1 -1)' 



Table 5. 
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Table 5: Eight-run 2^ fractional factorial designs and the corresponding models of three- 
way contingency tables {p — q = S) 



-L-'COl^ll • ^ J—-' — 


ABC 




7 / \ nil TTl /~\ o 1 

ly IN Ull lliUU-Cl 


V_v(Ji i copUliU-llig, lliUU-Cl Ui Z( 


Lauie 


4 A/B/C/D 


A /B/C + fABCl" 




5 AB/C/D 


AB/C + (ABC)" 




6 AB/AC/D 


AB/AC+ (ABC)'' 




Design : p = 5, D = 


AB, E = AC 




V Null model 


Corresponding model of 2^ 


table 


5 A/B/C/D/E 


AB/AC 




6 A/BC/D/E 


AB/AC/BC 




A/BE/C/D 


AB/AC + (ABC)« 




Design : p = 6, D = 


AB, E = AC, F = BC 




z/ Null model 


Corresponding model of 1? 


table 


6 A/B/C/D/E/F 


AB/AC/BC 





"The sufficient statistic for (ABC) is ym + ^122 + 2/212 + 2/221, 2/112 + 2/121 + 2/211 + 2/222- 



In the case of Binomial observations, there are 16 observations. Similarly to the 
Poisson case, we treat the observations as if they are the frequencies of 2^ contingency 
table. In the case of p = 5, for example, the design and the observations are given as 
follows. 







Factor 








Run 


A 


B 


C 


D 


E 




2/ 


1 


1 


1 


1 


1 


1 


2/1111 


2/1112 


2 


1 


1 


2 


1 


2 


2/1121 


2/1122 


3 


1 


2 


1 


2 


2 


2/1211 


2/1212 


4 


1 


2 


2 


2 


1 


2/1221 


2/1222 


5 


2 


1 


1 


2 


1 


2/2111 


2/2112 


6 


2 


1 


2 


2 


2 


2/2121 


2/2122 


7 


2 


2 


1 


1 


2 


2/2211 


2/2212 


8 


2 


2 


2 


1 


1 


2/2221 


2/2222 



For this type of data, we also specify parameter (i and the covariate matrix according 
to the appropriate models, by replacing X by X of (fTTi) . Note that the elements of y is 
ordered as 

y = (2/1111 5 2/1121 5 • • • 5 2/2211 5 2/2221 5 2/11125 2/1122) • • • 5 2/2212? 2/2222)'- 

Accordingly, correspondences to the models of 2^ contingency tables are easily obtained 
and the result is given in Table 6. 
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Table 6: Eight-run 2^ fractional factorial designs and the corresponding models of three- 
way contingency tables {p — q = S) 

Design : p = 4, D = ABC 

1/ Null model Corresponding model of 2^ table 

4 A/B/C/D AD/BD/CD + (ABC)" + (ABCD)" 

5 AB/C/D ABD/CD + (ABC)" + (ABCD)^ 

6 AB/AC/D ABD/ACD + (ABC)° + (ABCD)'' 
Design : p = 5, D = AB, E = AC 

1/ Null model Corresponding model of 2^ table 

5 A/B/C/D/E ABD/ACD + (ABC)"^ 

6 A/BC/D/E ABD/ACD/BCD/ABC 
A/BE/C/D ABD/ACD + (ABC)" + (ABCD)'' 

Design : p = 6, D = AB, E = AC, F = BC 

1/ Null model Corresponding model of 2^ table 

6 A/B/C/D/E/F ABD/ACD/BCD/ABC 

"The sufficient statistic for (ABC) is {Vijk }, hj,k = 1,2. 

*The sufficient statistic for (ABCD) is yuu + yi22e + y2i2e + y22U, 

yu2£ + yi2u + y2iu + 1/222^ ^ = 1,2. 

Table 6 is automatically converted from Table 5 as follows. By definition, D is added 
to all the generating sets. Note also that the sufficient statistic for each model includes 
{yijk-}, ^ ^ i, j, k < 2, by definition, which yields Table 6. Therefore the models which do 
not include all of AB, AC and BC do not correspond to hierarchical models. 

Fractional factorial designs with 16 runs. Next we consider fractional factorial 
designs with 16 runs, i.e., the case of p — q = 4. Table 7 is a list of sixteen-run 2^""^ 
fractional factorial designs {p — q = 4:,p < 10) from Section 4 of |25]. By the similar 
considerations to the 8 run cases, we can seek the corresponding models of 2^ contingency 
tables for the Poisson observations, and models of 2^ contingency tables for the Binomial 
observations. Since the modeling for Binomial observations can be easily obtained from 
the Poisson case as we have seen, we only consider the Poisson case here. 

Since at most sixteen parameters are estimable for the sixteen-run designs, we can 
consider various models of main effects and interaction effects. For example, the satu- 
rated model of the p = 5 design, E = ABCD, can include all the main and two-factor 
interactions, 

AB /AC /AD / AE /BC /BD /BE/CD /CE /DE. 

Note that for the models of p = 5, 6, 7, 8 in Table 7, each main effect and two-factor 
interaction is estimable. (On the other hand, for the resolution III models of p = 9, 10, 
some of the two-factor interactions are not estimable.) Among the models which include 
all the main effects and some of the two-factor interaction effects, some models have 
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Table 7: Sixteen-run 2^ fractional factorial designs {p — q = 4^) 



Number of factors p Resolution Design Generators 



5 


V 


E = 


ABCD 






6 


IV 


E = 


ABC,F = 


ABD 




7 


IV 


E = 


ABC,F = 


ABD, G = 


ACD 


8 


IV 


E = 


ABC,F = 


ABD, G = 


ACD 






H = 


BCD 






9 


III 


E = 


ABC,F = 


ABD, G = 


ACD 






H = 


BCD, J = 


ABCD 




10 


III 


E = 


ABC,F = 


ABD, G = 


ACD 






H = 


BCD, J = 


ABCD,K 


= CD 



the corresponding hierarchical model in the 2^ contingency tables if we write the sixteen 
observations as y = {yijk£},i,j,k,£ = 1,2. For example, for the p = 6 design of E = 
ABC, F = ABD, the model of 6 main effects and 5 two-factor interaction effects, 

AB /AC /AD /BC /BD /E/F, 

has a corresponding model of ABC/ ABD in the 2^ contingency tables. By the aliasing 
relation 

AB = CE = DF, AC = BE, AD = BE, AE = BC, 
AF = BD, CD = EE, CF = DE = ABCD, 

it is seen that there are 3 ■ 2 ■ 2 ■ 2 • 2 = 48 distinct models such as 

AB /AC /AD / AE /AF /E/F, 
AB /AC/AD /AE/BD/E/F, 
AB /AC /AD /BC /AF/E /F, 
AB /AC /AD /BC /BD /E/F, 

DF/BE/BF/BC/AF/E/F, 
DF/BE/BF/BC/BD/E/F, 

which correspond to the model of ABC/ ABD in the 2'^ contingency tables. By the similar 
considerations, we can specify all the models for the designs of Table 7 which correspond 
some hierarchical models in the 2^ contingency tables. The result is shown in Table 8. 

One of the merits to specify corresponding hierarchical models of contingency tables is 
a possibility to make use of general already known results on Markov bases of contingency 
tables. For example, shows that a Markov basis can be constructed by the primitive 



moves, i.e., degree 2 moves, for the decomposable graphical models in the contingency 
tables. In our designed experiments, therefore, Markov basis for the models which corre- 
spond to decomposable graphical models of contingency tables can be constructed only 
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Table 8: Sixteen-run 2^ fractional factorial designs and the corresponding hierarchical 
models of 2^ contingency tables (p — = 4) 



Lycblg,li . jL/ — U, H/ — iA±_) , J? — Jr\ULJ 






7/ PTlTPCiPri't" "1"T\7'P 
J- IJl CoClit CIj tl V C 


Num. of the 


Corresponding hierarchical 


mill modpl 


null models 


model of 2^ table 


11 AB/AC/AD/BC/BD/E/F 


48 


ABC/ABD 


12 AB/AC/AD/BC/BD/CD/E/F 


96 


ABC/ABD/CD 


Design : p = 7,E = ABC, F = ABD, G 


= ACD 




u Representative 


Num. of the 


Corresponding hierarchical 


null model 


null models 


model of 2^ table 


11 AB/AC/AD/BC/BD/CD/E/F/G 


3" = 729 


ABC/ABD/ACD 


Design : p = 8, E = ABC, F = ABD, G 


= ACD, H = BCD 


v Representative 


Num. of the 


Corresponding hierarchical 


null model 


null models 


model of 2^ table 


11 AB/AC/AD/BC/BD/CD/E/F/G 


46 = 4096 


ABC/ABD/ACD/BCD 



by the primitive moves. Among the results of Table 5,6 and 8, there are two models 
which corresponds to decomposable graphical models in the contingency tables. We can 
confirm that minimal Markov bases for these models consist of primitive moves as follows. 
We use the following notational convention for a move z. Consider 2^ case z = If 
= +1, ^isjaks — ^iAjAki — +1, and Other elements are zeros, then we denote 

z as 

{iljlh){i2j2k2) - («3i3^3)(«4j4^4)- 

Similar notation is used for 2^ case. 

• 2^~^ fractional factorial design of D = AB, E = AC: 

The main effects model A/B/C/D/E corresponds to the decomposable graphical 
model AB/AC of the 2^ contingency tables. This is a conditional independence 
model between B and C given A and a minimal Markov basis is constructed by 
primitive moves as 

(111)(122) - (112)(121), (211)(222) - (212)(221). 

• 2^-^ fractional factorial design of E = ABC, F = ABD: 

The model AB/AC/AD/BC/BD/E/F corresponds to the decomposable graphical 
model ABC/ABD of the 2^ contingency tables. This is a conditional independence 
model between C and D given {A, B} and a minimal Markov basis is again con- 
structed by primitive moves as 

(1111)(1122) - (1112)(1121), (1211)(1222) - (1212)(1221), 
(2111)(2122) - (2112)(2121), (2211)(2222) - (2212)(2221). 
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For the other designs of Table 7 (p = 5,9, 10), all the models include the sufficient 
statistic 

Z/llll + ?/ll22 + 1/1212 + 2/1221 + 1/2112 + 2/2121 + 2/2211 + 2/2222, 
2/1112 + 2/1121 + 2/1211 + 2/1222 + 2/2111 + 2/2122 + 2/2212 + 2/2221, 

and therefore have no corresponding hierarchical models in the 2'^ contingency tables. For 
example, the sufficient statistic of the the main effect models for 2^~^ design of E = ABCD 
is 

{2/i-}, {2/j-}, {y-k}, {2/-J, ij,k,i = l,2, 

2/1111 + 2/1122 + 2/1212 + 2/1221 + 2/2112 + 2/2121 + 2/2211 + 2/2222, 
2/1112 + 2/1121 + 2/1211 + 2/1222 + 2/2111 + 2/2122 + 2/2212 + 2/2221, 

and the sufficient statistic of the main effect models for 2^°^^ design of 

E = ABC, F = ABD, G = ACD, H = BCD, J = ABCD, K = CD 

is 

{Vijk-}, {Vij-e}^ {Vike}, {y-jki}, i,j,kj = 1,2, 

2/1111 + 2/1122 + 2/1212 + 2/1221 + 2/2112 + 2/2121 + 2/2211 + 2/2222, 
2/1112 + 2/1121 + 2/1211 + 2/1222 + 2/2111 + 2/2122 + 2/2212 + 2/2221- 



4 Discussion 



In this paper, we consider Markov chain Monte Carlo tests for the factor effects in the 
designed experiments. As is noted in Section 1, Markov chain Monte Carlo procedure is 
a valuable tool when the adequacy of traditional large-sample tests is doubtful and the 
enumeration of the conditional sample space is infeasible. Since a closed form expression 
of the null distribution for the conditional tests considered in Section 2 is not available 
in general, Markov chain Monte Carlo procedure is valuable in the settings of this paper. 
Computational experience given in Section 3.1 shows efficacy of our method. 

To perform Markov chain Monte Carlo tests, it is often problematic to calculate a 
Markov basis. Current algorithms may take a very long time to compute Markov basis 
for 64 run or larger experiments. For the designs of 16 or 32 runs (for the Poisson models), 
a software such as 4ti2 works quite well and very practical. Nevertheless, the arguments 
and theoretical considerations in Section 3.2 seem important. One of the merits to specify 
the corresponding models of 2^ contingency tables is a possibility to make use of general 
results for the Markov bases of contingency tables as shown in Section 3.2. 

It is also important to consider more complicated designs and give appropriate Markov 
bases for them, such as designs with three levels or balanced incomplete block designs. 

The designed experiment is one of the areas in statistics where the applications of 



the theory of Grobner basis are ffist considered. See the works [19|, [22[ and [20[. In 
these works, the design is represented as the variety for the set of polynomial equations. 
On the other hand, this manuscript gives another application of Grobner basis theory to 
the designed experiments by considering Markov chain Monte Carlo tests for a discrete 
response variable. 
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